Combined learning methods and mining complex data
Abstract
Machine Learning has made tremendous progress in recent decades. Having started as a sub-domain of Artificial Intelligence, it has grown into an independent field of study. New research problems have been identified, many new methods have been introduced, and the number of their applications in various areas is still increasing. Machine Learning and the related, younger field of Data Mining have become powerful tools for knowledge discovery in databases and intelligent data analysis. One of the most popular tasks is supervised learning, in particular discovering classification knowledge from data, which usually leads to the creation of classifiers. Classifiers can be constructed in different ways, the most common being decision trees, decision rules, Bayesian approaches, instance-based learning, artificial neural networks, discriminant functions, and support vector machines. Most past and present research concerns developing single learning algorithms. However, both theoretical considerations (e.g. the “no free lunch theorem”) and experimental experience indicate that no single algorithm can be expected to be the best overall. Each reasonable algorithm has its own area of superiority and cannot outperform the others on all possible learning problems. The limitations of single algorithms can be overcome, and predictive accuracy improved, by integrating several diversified classifiers into a combined system. Such systems, known as ensembles or multiple classifiers, have received noticeable research and application interest at least since the late 1980s. As a result, several different approaches to generating component classifiers and aggregating their predictions have been proposed. The most influential include Bagging and Boosting, which manipulate the learning examples.
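As a rough illustration of how Bagging manipulates the learning examples, the sketch below trains each component classifier on a bootstrap sample (drawn with replacement) and aggregates predictions by majority vote. It is a minimal sketch only: the decision-stump base learner, the function names `train_stump`, `bagging`, and `predict`, and the toy data are assumptions made for illustration and do not come from the abstract.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a one-feature threshold classifier (decision stump) to (x, label) pairs."""
    best = None
    for t in sorted({x for x, _ in sample}):
        # Try both label orientations around the candidate threshold t.
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(data, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) examples with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_stump(bootstrap))
    return models


def predict(models, x):
    """Aggregate the component predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one numeric feature, binary class label.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging(data)
print(predict(ensemble, 1.0), predict(ensemble, 4.0))
```

Individual stumps trained on unlucky bootstrap samples can misplace the threshold; the vote across many resampled stumps is what stabilises the combined prediction. Boosting differs in that it reweights examples sequentially rather than resampling them independently.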
However, solving difficult problems also involves changing feature representations and integrating classifiers learned from different feature subsets. Multiple classifiers are also used for learning from dynamic data streams, where the target classes may change over time. Furthermore, some problems are best handled by hybrid classifiers, which use multi-strategic learning to fit data with different representations. Another critical issue is that most learning and data mining algorithms have been developed for stable, tabular representations of data. Such tabular data are present in many software systems and are often easily obtained from relational databases, which are common in any application domain. Moreover, data from other sources are often transformed into this format, for example by processing text documents with natural language techniques or by extracting numerical features from images. In some domains, however, this data model appears to be too restrictive. Many modern automatic systems in science, engineering, medicine, and the social sciences are able to collect larger volumes of data with increasingly rich structure. This growing complexity comes both from the need for richer and more precise descriptions of real-world objects and from the development of new technologies for measuring or collecting them. For example, such complex data may appear in biology, bio-informatics, analysis of heterogeneous representations/files inside the patient electronic record, mining graph structure, industrial
Similar resources
Credit Card Fraud Detection using Data mining and Statistical Methods
Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...
Evaluating machine learning methods and satellite images to estimate combined climatic indices
The reflections recorded on satellite images have been affected by various environmental factors. In these images, some of these factors are combined with other environmental factors that cannot be distinguished. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...
Combined Mining Approach to Generate Patterns for Complex Data
In Data mining applications, which often involve complex data like multiple heterogeneous data sources, user preferences, decision-making actions and business impacts etc., the complete useful information cannot be obtained by using single data mining method in the form of informative patterns as that would consume more time and space, if and only if it is possible to join large relevant data s...
Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach
Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...
Pattern Generation for Complex Data Using Hybrid Mining
Combined mining is a hybrid mining approach for mining informative patterns from single or multiple data-sources, multiple-features extraction and applying multiple-methods as per the requirements. Data mining applications often involve complex data like multiple heterogeneous data sources, different user preference and create decision-making actions. The complete useful information may not be ...
Learning FCM by Data Mining in a Purchase System
Fuzzy Cognitive Maps (FCMs) have successfully been applied in numerous domains to show the relations between essential components in complex systems. In this paper, a novel learning method is proposed to construct FCMs based on historical data and by using meta-heuristic: Genetic Algorithm (GA), Simulated Annealing (SA), and Tabu Search (TS). Implementation of the proposed method has demonstrat...
Journal: Intell. Data Anal.
Volume: 16
Publication date: 2012